Configurable generic filter hardware block and methods

ABSTRACT

A configurable generic filter hardware block and corresponding methods are provided. A configurable generic filter hardware block comprises a plurality of multipliers; a plurality of adders; and one or more multiplexers, wherein the configurable generic filter hardware block is configured using a header data structure, the header data structure comprises a pointer to a memory location storing a plurality of input samples, a pointer to a memory location storing a plurality of output samples and a coefficient selection control value. The configurable generic filter hardware block is optionally invoked by a convolution instruction in one or more of a vector processor and a state machine. An exemplary Generic Filter Iteration comprises loading input samples; selecting coefficients; convolving the input samples and the selected coefficients and storing output samples. Each Generic Filter Iteration has a corresponding header data structure. The header data structures are optionally stored sequentially in memory and processed in a single loop.

CROSS-REFERENCE TO RELATED APPLICATION

The present application is related to U.S. patent application Ser. No. 13/701,376, filed Apr. 24, 2013, entitled “Vector Processor Having Instruction Set With Vector Convolution Function For FIR Filtering,” which claims priority to International Patent Application Serial No. PCT/US 12/62182, entitled “Vector Processor Having Instruction Set With Vector Convolution Function For FIR Filtering,” and U.S. Patent Provisional Application Ser. No. 61/552,242, filed Oct. 27, 2011, entitled “Software Digital Front End (SoftDFE) Signal Processing and Digital Radio,” each incorporated by reference herein.

FIELD OF THE INVENTION

The present invention is related to digital signal processing techniques and, more particularly, to techniques for digital filtering.

BACKGROUND

Finite Impulse Response (FIR) digital filtering is used in many signal processing applications. A data stream is often filtered in multiple stages, with the output of one filter serving as the input of another filter. In addition, a plurality of such data streams may need to be processed in parallel. A filtering operation typically comprises a processing loop where the input data stream and filter coefficients are read and convolved to produce the output data.

Filtering performance can be improved to meet higher throughput requirements, for example, by increasing the clock frequency of the processing hardware. The clock frequency, however, will be constrained by a physical limit. The filtering performance can also be improved, for example, using a vector approach to exploit data-level parallelism of the filtering operation. A vector approach processes multiple samples (i.e., a vector) in one cycle by adding additional parallel hardware (e.g., multipliers and accumulators). A vector approach is possible if the data and coefficients are stored in contiguous locations.

For each filter, the processing loop has a fixed overhead in the number of cycles before the loop attains steady state, which increases with the depth of the pipeline. Also, with a vector approach, this fixed overhead becomes a higher fraction of the overall processing cycles. When a plurality of filters is processed, each filter requires a separate processing loop, and the overall processing efficiency decreases. In applications requiring low latency, e.g., a Digital Front End (DFE) of a wireless base station, the input block sizes are small, so the number of iterations in each loop is also fairly small. Thus, the number of loop overhead cycles is comparable to (or sometimes greater than) the actual number of processing cycles.

A need therefore exists for filtering techniques that improve processing efficiency by reducing the number of loops. A further need exists for filtering techniques that improve processing efficiency by reducing the number of loops to a single loop.

SUMMARY

Generally, a configurable generic filter hardware block and corresponding methods are provided that improve processing efficiency by reducing the number of loops. According to one aspect of the invention, a configurable generic filter hardware block comprises a plurality of multipliers; a plurality of adders; and one or more multiplexers, wherein the configurable generic filter hardware block is configured using a header data structure, the header data structure comprises a pointer to a memory location storing a plurality of input samples, a pointer to a memory location storing a plurality of output samples and a coefficient selection control value. The header data structure also comprises, for example, an accumulation control value and/or an input/output data selection value.

In one exemplary embodiment, the configurable generic filter hardware block is invoked by a convolution instruction in one or more of a vector processor and a state machine. An exemplary Generic Filter Iteration comprises loading a plurality of input samples; selecting a plurality of coefficients; convolving the plurality of input samples and the plurality of selected coefficients and storing a plurality of output samples. Each of the exemplary Generic Filter Iterations has a corresponding header data structure. The header data structures are optionally stored sequentially in memory and processed in a single loop. The header data structures are optionally sequenced to reduce dependencies and can be precomputed off-line.

A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic block diagram of a portion of a conventional Digital Front End (DFE) of a wireless base station;

FIG. 2 illustrates a conventional filtering technique where filter operations for a plurality of filters 1 through n are performed;

FIG. 3 illustrates a plurality of filter operations performed in a single processing loop in accordance with an exemplary embodiment of the invention;

FIG. 4 is a schematic block diagram illustrating a configurable generic filter hardware block in accordance with the present invention that is configured by a filter header;

FIG. 5 illustrates a sequence of headers stored in a header memory; and

FIG. 6 illustrates the configurable generic filter hardware block of FIG. 4 in further detail.

DETAILED DESCRIPTION

Aspects of the present invention provide a configurable generic filter hardware block and methods for configuring and employing the configurable generic filter hardware block. The configurable generic filter hardware block comprises a plurality of multipliers and adders and one or more multiplexers and is configured using a header data structure. In one exemplary embodiment, the header data structure comprises a pointer to an input sample buffer, a pointer to an output sample buffer and a coefficient selection control value. While the present invention is illustrated herein in the context of a Digital Front End (DFE) of a wireless base station, and particularly for performing filtering in a Digital Upconversion stage, the present invention is applicable to any filtering application.

FIG. 1 is a schematic block diagram of a portion of a conventional Digital Front End (DFE) 100 of a wireless base station. Generally, the portion of the conventional Digital Front End (DFE) 100 corresponds to a digital up-conversion portion of the DFE. As shown in FIG. 1, the exemplary conventional Digital Front End 100 comprises an upsampling filter stage 110 and an up-conversion stage 120. The exemplary upsampling filter stage 110 comprises a first plurality of filters L11 . . . L15 for processing a first carrier signal, a second plurality of filters L21 . . . L25 for processing a second carrier signal, a third plurality of filters L31 . . . L34 for processing a third carrier signal, and a fourth plurality of filters L41 . . . L43 for processing a fourth carrier signal. For example, the exemplary first carrier signal received from baseband (not shown) has a frequency of 3.84 MSps. After the first carrier signal is processed by three filters L11-L13, the first carrier signal has a frequency of 30.72 MSps, and after the first carrier signal is processed by filters L14-L15, the upsampled first carrier signal has a frequency of 122.88 MSps.

The exemplary up-conversion stage 120 comprises a plurality of multipliers 125-1 through 125-4 for multiplying each upsampled carrier signal by a local oscillator (LO) signal and an adder 128 to combine the outputs of the multipliers 125-1 through 125-4 to produce a combined carrier signal having a frequency of 122.88 MSps. The output of the adder 128 is optionally provided to a crest factor reduction (CFR) stage (not shown).

FIG. 2 illustrates a conventional filtering technique where filter operations for a plurality of filters 1 through n are performed. As shown in FIG. 2, each filter i has a corresponding processing loop where one or more filter operations are performed. Each filter operation (vec_conv) comprises convolving the input data stream with filter coefficients to produce output data. For each filter i, the processing loop has a fixed overhead in the number of cycles before the loop attains steady state, which typically increases with the depth of the pipeline. The overhead incurred for each filter impairs overall performance.

Aspects of the invention provide a configurable generic filter hardware block and associated methods that improve processing efficiency by reducing the number of loops. In one exemplary embodiment, processing efficiency is improved by reducing the number of loops to a single loop. As noted above, the configurable generic filter hardware block is configured using a header data structure. In one exemplary embodiment, the header data structure comprises a pointer to an input sample buffer, a pointer to an output sample buffer and a coefficient selection control value.

Additional aspects of the present invention extend conventional vector processors to provide an enhanced instruction set that supports vector convolution functions. A vector processor in accordance with exemplary aspects of the present invention receives an input vector having real or complex inputs, applies a complex vector convolution function to the input and generates a vector having one output value for each time shift. A vector convolve software instruction keyword is optionally part of an instruction set of a vector processor and/or a state machine.

According to another aspect of the invention, a Generic Filter Iteration routine is provided to configure the configurable generic filter hardware block based on a corresponding header data structure for a given filter operation. The exemplary Generic Filter Iteration routine comprises the steps of (i) loading a plurality of input samples based on the pointer to the input sample buffer identified in the header data structure; (ii) selecting a plurality of coefficients based on the coefficient selection control value in the header data structure; (iii) convolving the plurality of input samples and the plurality of selected coefficients (optionally performed multiple times with accumulation, if needed); and (iv) storing a plurality of output samples based on the pointer to the output sample buffer identified in the header data structure. In one exemplary embodiment, the plurality of coefficients are selected by loading the plurality of coefficients from a memory based on the coefficient selection control value in the header data structure. The filter state is also optionally stored after each exemplary Generic Filter Iteration. For example, the last L-1 input samples from the prior iteration are needed for the next iteration, where L is the length of the filter.
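The four steps of the Generic Filter Iteration routine can be sketched in software as follows. This is only an illustrative model, not the hardware implementation: the names FilterHeader and generic_filter_iteration, the flat list used as memory, and the block_len parameter are assumptions introduced for the example. The input buffer is assumed to hold the L-1 state samples from the prior iteration followed by the new samples.

```python
from typing import NamedTuple


class FilterHeader(NamedTuple):
    """Hypothetical software model of the header data structure."""
    coeff_select: int  # coefficient selection control value
    out_ptr: int       # offset of the output sample buffer in memory
    in_ptr: int        # offset of the input sample buffer in memory


def generic_filter_iteration(hdr, memory, coeff_sets, block_len):
    """Run one iteration; block_len (outputs per call) is an assumed parameter."""
    # Step (ii): select the coefficients named by the header.
    taps = coeff_sets[hdr.coeff_select]
    L = len(taps)
    # Step (i): load L-1 state samples followed by block_len new samples.
    x = memory[hdr.in_ptr : hdr.in_ptr + block_len + L - 1]
    # Step (iii): convolve the input samples with the selected coefficients.
    y = [sum(taps[k] * x[n + L - 1 - k] for k in range(L))
         for n in range(block_len)]
    # Step (iv): store the outputs at the location named by the header.
    memory[hdr.out_ptr : hdr.out_ptr + block_len] = y
    return y


memory = [0] * 16
memory[0:6] = [0, 0, 1, 2, 3, 4]          # two state samples, four new samples
hdr = FilterHeader(coeff_select=0, out_ptr=8, in_ptr=0)
outputs = generic_filter_iteration(hdr, memory, [[1, 1, 1]], block_len=4)
print(outputs)  # [1, 3, 6, 9]
```

With a three-tap filter (L = 3), the two leading zeros play the role of the saved filter state from a prior iteration.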

The convolve operation in the exemplary Generic Filter Iteration routine is optionally performed in response to the vector convolve software instruction keyword to invoke the configurable generic filter hardware block. Generally, if a vector processor is processing software code that includes a predefined instruction keyword corresponding to a vector convolution function and the appropriate operands for the function (i.e., the input samples), the instruction decoder must invoke the configurable generic filter hardware block to process the vector convolution instruction.

FIG. 3 illustrates a plurality of filter operations performed in a single processing loop 300 in accordance with an exemplary embodiment of the invention. Generally, the single processing loop 300 combines all of the filter operations (vec_conv) using a set of headers, as discussed further below in conjunction with FIG. 4. In this manner, the fixed overhead for the single loop is incurred only once, compared to the conventional approach of FIG. 2.

FIG. 4 is a schematic block diagram illustrating a configurable generic filter hardware block 600 in accordance with the present invention that is configured by a filter header 410. The configurable generic filter hardware block 600, discussed further below in conjunction with FIG. 6, is configurable in terms of input/output selection, formatting, and data type selection (real/complex). The behavior of the configurable generic filter hardware block 600 is customized based on the set of control/configuration inputs in the filter header 410.

As shown in FIG. 4, the exemplary filter header 410 comprises a configuration and control information field 412 (e.g., accumulation control and input/output data selection); a coefficient selection control value 414; a pointer 416 to an output sample buffer 422 of memory 420 and a pointer 418 to an input sample buffer 426 of memory 420. As indicated above, the filter header 410 defines how the configurable generic filter hardware block 600 is configured for one exemplary Generic Filter Iteration.

As shown in FIG. 4, the coefficient selection control value 414 may be used as an index into a set 440 of filter coefficients. In a further variation, the coefficient selection control value 414 can serve as a pointer to obtain the filter coefficients from memory.
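For illustration only, the fields of the filter header 410 and the index-based coefficient selection might be modeled as in the following sketch. The class name, the ACCUMULATE bit position within field 412, and the example coefficient values are assumptions; the description does not specify field widths or encodings.

```python
from dataclasses import dataclass

ACCUMULATE = 0x1  # assumed bit position; no encoding is given in the description


@dataclass(frozen=True)
class FilterHeader:
    """Hypothetical software view of filter header 410."""
    config: int        # configuration and control information field 412
    coeff_select: int  # coefficient selection control value 414
    out_ptr: int       # pointer 416 to the output sample buffer
    in_ptr: int        # pointer 418 to the input sample buffer

    @property
    def accumulate(self) -> bool:
        """Decode the assumed accumulation control bit from field 412."""
        return bool(self.config & ACCUMULATE)


# Stand-in for the set 440 of filter coefficients (values are made up).
COEFF_SETS = [[0.25, 0.5, 0.25], [1.0, -1.0]]


def select_coefficients(hdr, coeff_sets=COEFF_SETS):
    """Use value 414 as an index into the coefficient set."""
    return coeff_sets[hdr.coeff_select]


hdr = FilterHeader(config=ACCUMULATE, coeff_select=1, out_ptr=64, in_ptr=0)
taps = select_coefficients(hdr)
```

In the pointer variation, coeff_select would instead be treated as a memory address from which the coefficients are loaded.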

The output samples 458 of the configurable generic filter hardware block 600 are normally stored in the output sample buffer 422, as identified by pointer 416. In an accumulation mode, however, the output 455 of the configurable generic filter hardware block 600 is fed back to one or more accumulators 430.

The filter headers 410 are optionally precomputed for each call of the Generic Filter Iteration, which can optionally be done offline, and stored in memory. In this manner, the processing efficiency is increased without increasing the hardware complexity.

As indicated above, the configurable generic filter hardware block 600 can be used as an instruction “vec_cnv_filt” in a vector processor.

FIG. 5 illustrates a sequence of headers hdr 1-N stored in a header memory 510. The sequence of headers 1-N is optionally processed in sequence in a loop 520. As shown in FIG. 5, each header hdr has the same format as the filter header 410 of FIG. 4 with a pointer 416 to an output sample buffer 422 of memory 420 and a pointer 418 to an input sample buffer 426 of memory 420.
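The single loop over a header sequence can be sketched as follows, with two cascaded filters driven by two headers. The names Hdr, vec_conv, BLOCK and COEFFS, the fixed block size, and the buffer offsets are assumptions chosen for the example; the second header reads the buffer that the first header writes.

```python
from typing import NamedTuple


class Hdr(NamedTuple):
    """Hypothetical header: coefficient index plus output and input offsets."""
    coeff_select: int
    out_ptr: int
    in_ptr: int


BLOCK = 4                    # outputs per iteration (assumed, fixed for simplicity)
COEFFS = [[1, 1], [1, -1]]   # made-up coefficient sets


def vec_conv(hdr, mem):
    """Software stand-in for one call of the filter block."""
    taps = COEFFS[hdr.coeff_select]
    L = len(taps)
    x = mem[hdr.in_ptr : hdr.in_ptr + BLOCK + L - 1]
    mem[hdr.out_ptr : hdr.out_ptr + BLOCK] = [
        sum(taps[k] * x[n + L - 1 - k] for k in range(L)) for n in range(BLOCK)
    ]


mem = [0] * 32
mem[0:5] = [0, 1, 2, 3, 4]   # one state sample, then four new samples

# Header memory: filter 1 writes to offset 9; filter 2 reads from offset 8,
# so that mem[8] serves as its single state sample.
header_memory = [
    Hdr(coeff_select=0, out_ptr=9, in_ptr=0),
    Hdr(coeff_select=1, out_ptr=16, in_ptr=8),
]

# One loop handles every filter; only the header changes per iteration.
for hdr in header_memory:
    vec_conv(hdr, mem)

print(mem[9:13], mem[16:20])  # [1, 3, 5, 7] [1, 2, 2, 2]
```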

FIG. 6 illustrates the configurable generic filter hardware block 600 of FIG. 4 in further detail. Generally, as indicated above, the configurable generic filter hardware block 600 comprises a plurality of multipliers and adders and one or more multiplexers. In addition, the configurable generic filter hardware block 600 is configured using the filter header 410 of FIG. 4.

As shown in FIG. 6, the configuration and control information field 412 (e.g., accumulation control and input/output data selection) is applied to an output sample packing block 660. The input samples 610 are applied to an input sample selection block 625 and a symmetric sample selection block 630. The input sample selection block 625 selects the input samples 610 based on the input sample pointer 418. The symmetric sample selection block 630 optionally exploits filter symmetry. The selected input samples are applied to adders 640 and the outputs of adders 640 are applied to a multiplier array and reduction trees block 650.

The filter coefficients 615 are applied to a coefficient selection and replication block 635. The coefficient selection and replication block 635 selects the filter coefficients based on the coefficient selection control value 414. The selected filter coefficients are applied to the multiplier array and reduction trees block 650. The output of the multiplier array and reduction trees block 650 is applied to an adder 655.
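One common way to exploit filter symmetry, which may correspond to the arrangement of the symmetric sample selection block 630 ahead of the adders 640 and the multiplier array, is to add the two input samples that share a coefficient before multiplying. The sketch below is an assumption about that technique in general and is not taken from FIG. 6; the function names are hypothetical.

```python
def fir_direct(taps, window):
    """Reference: one multiply per tap."""
    return sum(h * x for h, x in zip(taps, window))


def fir_folded(taps, window):
    """Symmetric taps (taps[k] == taps[L-1-k]): pre-add paired samples,
    then multiply once per pair, roughly halving the multiplier count."""
    L = len(taps)
    half = L // 2
    acc = sum(taps[k] * (window[k] + window[L - 1 - k]) for k in range(half))
    if L % 2:
        # An odd-length filter has an unpaired centre tap.
        acc += taps[half] * window[half]
    return acc


taps = [1, 2, 3, 2, 1]        # symmetric coefficients (made-up values)
window = [5, 4, 3, 2, 1]      # five consecutive input samples
print(fir_direct(taps, window), fir_folded(taps, window))  # 27 27
```

The folded form uses three multiplies instead of five for this five-tap example.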

In an accumulation mode, input accumulators 620, which are obtained from accumulator registers 670, are applied to an input of a multiplexer 645. The multiplexer 645 selects an input based on the accumulation control in the configuration and control information field 412.

As shown in FIG. 6, the output of adder 655 is applied to a shift, round and saturate block 665 and is also provided as the accumulator output 680. The output of the shift, round and saturate block 665 is applied to the output sample packing block 660, which produces output samples 675.
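The accumulation mode can be illustrated with a filter that has more taps than a single call is assumed to handle. In the following sketch, a four-tap filter is computed as two two-tap calls, with the partial sums of the first call fed into the second. The function mac_call, its parameters, and the tap split are assumptions made for the example.

```python
def mac_call(taps, x, start, n_out, acc=None):
    """One call: acc[n] + sum_j taps[j] * x[start + n - j] for each output n."""
    if acc is None:
        acc = [0] * n_out   # no accumulation: start from zero
    return [
        acc[n] + sum(t * x[start + n - j] for j, t in enumerate(taps))
        for n in range(n_out)
    ]


h = [1, 2, 3, 4]             # full filter, L = 4 (made-up values)
x = [1, 0, 2, 0, 1, 3]       # L-1 state samples followed by three new samples
L, K, n_out = len(h), 2, 3   # K = taps handled per call (assumed)

# Reference: the whole filter in a single call.
full = mac_call(h, x, L - 1, n_out)

# Two calls: the second runs in accumulation mode on the first call's sums.
partial = mac_call(h[:K], x, L - 1, n_out)
split = mac_call(h[K:], x, L - 1 - K, n_out, acc=partial)

print(full, split)  # [8, 7, 13] [8, 7, 13]
```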

Thus, aspects of the present invention recognize that any filter can be implemented as a sequence of multiple calls of the configurable generic filter hardware block 600. A unique filter header 410 is computed for each call of the exemplary Generic Filter Iteration routine. Each filter header 410 is optionally pre-computed offline. Each iteration invokes the configurable generic filter hardware block 600. The headers 410 are optionally stored sequentially in memory and processed in a single loop, which is optionally software pipelined.

The headers 410 of different carriers and/or filters are optionally arranged in an appropriate sequence to avoid dependencies. In this manner, pipeline stalls due to filter input/output dependencies can be minimized, if not eliminated. For example, in the exemplary conventional Digital Front End 100 of FIG. 1, each of the plurality of filters for a given carrier depends on the output of the prior filter in the sequence. For example, filter L12 depends on the output of filter L11. Thus, a subsequent filter of a given carrier cannot begin processing until the processing for a prior filter in the chain has completed. The filters for one carrier, however, do not depend on the output of the filters for a different carrier. For example, filters L11 through L15 do not depend on the output of filters L21 through L25.

Thus, another aspect of the invention sequences the header data structures to reduce dependencies. For example, the first filter for each carrier can be processed in sequence, before the second filter for each carrier is processed.
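The ordering described above can be sketched as a round-robin interleave of the per-carrier filter chains of FIG. 1. The function name sequence_headers is hypothetical, and the filter labels stand in for the corresponding header data structures.

```python
from itertools import zip_longest


def sequence_headers(chains):
    """Emit the first filter of every carrier, then the second, and so on."""
    order = []
    for stage in zip_longest(*chains):
        # Shorter chains run out first; skip their padding entries.
        order.extend(h for h in stage if h is not None)
    return order


chains = [
    ["L11", "L12", "L13", "L14", "L15"],   # first carrier
    ["L21", "L22", "L23", "L24", "L25"],   # second carrier
    ["L31", "L32", "L33", "L34"],          # third carrier
    ["L41", "L42", "L43"],                 # fourth carrier
]
order = sequence_headers(chains)
print(order[:8])
# ['L11', 'L21', 'L31', 'L41', 'L12', 'L22', 'L32', 'L42']
```

With this ordering, the calls for other carriers separate each filter from the prior filter of its own carrier, which gives the earlier output time to complete.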

CONCLUSION

While exemplary embodiments of the present invention have been described with respect to digital logic blocks and memory tables within a digital processor, as would be apparent to one skilled in the art, various functions may be implemented in the digital domain as processing steps in a software program, in hardware by circuit elements or state machines, or in combination of both software and hardware. Such software may be employed in, for example, a digital signal processor, application specific integrated circuit or micro-controller. Such hardware and software may be embodied within circuits implemented within an integrated circuit.

Thus, the functions of the present invention can be embodied in the form of methods and apparatuses for practicing those methods. One or more aspects of the present invention can be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, wherein, when the program code is loaded into and executed by a machine, such as a processor, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a device that operates analogously to specific logic circuits. The invention can also be implemented in one or more of an integrated circuit, a digital processor, a microprocessor, and a micro-controller.

It is to be understood that the embodiments and variations shown and described herein are merely illustrative of the principles of this invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.

What is claimed:
 1. A configurable generic filter hardware block, comprising: a plurality of multipliers; a plurality of adders; and one or more multiplexers, wherein said configurable generic filter hardware block is configured using a header data structure, said header data structure comprises a pointer to a memory location storing a plurality of input samples, a pointer to a memory location storing a plurality of output samples and a coefficient selection control value.
 2. The configurable generic filter hardware block of claim 1, wherein said header data structure further comprises one or more of an accumulation control value and an input/output data selection value.
 3. The configurable generic filter hardware block of claim 1, wherein said configurable generic filter hardware block is invoked by a convolution instruction in one or more of a vector processor and a state machine.
 4. The configurable generic filter hardware block of claim 1, further comprising a Generic Filter Iteration to perform the following steps: loading said plurality of input samples; selecting a plurality of coefficients; convolving said plurality of input samples and said plurality of selected coefficients and storing a plurality of output samples.
 5. The configurable generic filter hardware block of claim 4, wherein said configurable generic filter hardware block is invoked by said convolving step in said Generic Filter Iteration a plurality of times in a single loop for a plurality of filters.
 6. The configurable generic filter hardware block of claim 4, wherein each of said Generic Filter Iterations has a corresponding one of said header data structures.
 7. The configurable generic filter hardware block of claim 6, wherein said header data structures are stored sequentially in memory and processed in a single loop.
 8. The configurable generic filter hardware block of claim 7, wherein said header data structures are sequenced to reduce dependencies.
 9. The configurable generic filter hardware block of claim 6, wherein said header data structures are precomputed off-line.
 10. A method for performing a plurality of filter operations, said method comprising: providing a configurable generic filter hardware block comprising a plurality of multipliers, a plurality of adders and one or more multiplexers, wherein said configurable generic filter hardware block is configured using a header data structure, wherein said header data structure comprises a pointer to a memory location storing a plurality of input samples, a pointer to a memory location storing a plurality of output samples and a coefficient selection control value; performing a Generic Filter Iteration routine comprising the following steps to configure said configurable generic filter hardware block based on one of said header data structures for a given one of said plurality of filter operations: loading a plurality of input samples based on said one of said header data structures; selecting a plurality of coefficients based on said one of said header data structures; convolving said plurality of input samples and said plurality of selected coefficients; and storing a plurality of output samples based on said pointer to a memory location storing a plurality of output samples in said one of said header data structures.
 11. The method of claim 10, wherein said step of selecting a plurality of coefficients further comprises the step of loading said plurality of coefficients from a memory based on said one of said header data structures.
 12. The method of claim 10, wherein said convolving step is performed in response to a vector convolve software instruction keyword to invoke said configurable generic filter hardware block.
 13. The method of claim 12, wherein said vector convolve software instruction keyword is part of an instruction set of one or more of a vector processor and a state machine.
 14. The method of claim 10, wherein said method is invoked by said convolving step in said Generic Filter Iteration a plurality of times in a single loop for a plurality of filters.
 15. The method of claim 10, wherein each of said Generic Filter Iterations has a corresponding one of said header data structures.
 16. The method of claim 15, wherein said header data structures are stored sequentially in memory and processed in a single loop.
 17. An apparatus that performs a plurality of filter operations, comprising: a configurable generic filter hardware block comprising a plurality of multipliers, a plurality of adders and one or more multiplexers, wherein said configurable generic filter hardware block is configured using a header data structure; at least one hardware device operative to: perform a Generic Filter Iteration routine comprising the following steps to configure said configurable generic filter hardware block based on one of said header data structures for a given one of said plurality of filter operations: load a plurality of input samples based on said one of said header data structures; select a plurality of coefficients based on said one of said header data structures; convolve said plurality of input samples and said plurality of selected coefficients; and store a plurality of output samples based on said pointer to a memory location storing a plurality of output samples in said one of said header data structures.
 18. The apparatus of claim 17, wherein said step of selecting a plurality of coefficients further comprises the step of loading said plurality of coefficients from a memory based on said one of said header data structures.
 19. The apparatus of claim 17, wherein said convolve operation is performed in response to a vector convolve software instruction keyword to invoke said configurable generic filter hardware block.
 20. The apparatus of claim 17, wherein said at least one hardware device comprises one or more of a vector processor and a state machine.
 21. The apparatus of claim 17, wherein each of said Generic Filter Iterations has a corresponding one of said header data structures.
 22. The apparatus of claim 21, wherein said header data structures are stored sequentially in memory and processed in a single loop.